ABSTRACT


The TIS Intelligent Gateway Computer at the Lawrence
Livermore  National  Laboratory  provides  authorized  users  automated 
access  to  other  information  centers,  downloading  of  descriptive 
information  and  numerical  data,  and  post-processing  of  bibliographic 
citations.  Included  is  the  aggregation  of  extracted  information  into 
topical  files,  the  elimination  of  redundancy,  and  online  review  for  the 
creation  of  annotated  relevant  sets.  Post-processing  of  the  reviewed 
information  can  be  carried  out  by  permutation  of  titles,  abstracts,  and 
descriptors  with  statistics  (some  in  graphical  form)  of  their 
single/multi-term  expressions,  statistical  cross-correlation  of  data 
elements,  and  the  creation  of  concordances  and  indexes.  These  tools 
give new insight into a subject matter or the characteristics of
corporate/personal  publications.  These  self-guided  procedures  can  be 
performed  online  from  remote  terminals  by  telephone  dial-up, 
WATS-lines,  over  TYMNET,  and  via  the  ARPA  computer  network.  The 
TIS Intelligent Gateway Computer permits the linking of terminals
among  users.  Information  specialists  and  information  requestors  may 
jointly  view  and  discuss  the  progress  of  an  interactive  search  and  its 
analysis  from  any  location.  Uncertain  legal  constraints  by  commercial 
information  vendors  limit  the  use  of  downloading  and  post-processing  at 
this time to bibliographical information in the public domain, e.g.,
DOE/RECON.


*Work performed under the auspices of the U.S. Department of Energy by the Lawrence
Livermore  National  Laboratory  under  contract  number  W-7405-ENG-48. 


1.  INTRODUCTION 


More  than  1244  bibliographic  and  numeric  data  files  are  now  available  from  203 
online information vendors.[1] This makes it increasingly difficult to identify relevant
citations  in  a  unified  manner  and  to  extract  meaningful  information  for  decision-making. 

Most  online  bibliographic  information  is  still  being  sold  by  offline  printing  following  a 
search.  At  best,  the  citations  are  shown  or  printed  in  chronologically  reverse  order  -  last 
publication  first.  When  a  search  is  carried  out  in  a  comprehensive  and  retrospective 
manner,  the  end-user  is  faced  with  piles  of  printouts  containing  redundant  citations  in 
different  formats  from  different  vendors.  Usually,  there  are  no  indexes  or  contents  lists 
to  the  returns.  Manual  review  and  organization  of  the  material  is  required.  Much  of  the 
information  thus  received  is  probably  being  discarded  unused. 

At  the  Lawrence  Livermore  National  Laboratory  (LLNL)  we  have  developed 
self-guided  programs  by  which  some  of  these  tasks  can  be  carried  out  automatically  with 
a dedicated information machine, the Technology Information System (TIS).[2-4] This
system,  under  development  and  use  since  1975,  contains  an  expanding  master  directory  to 
databases  of  other  information  centers.  Authorized  users  are  connected  to  the  named 
information  center  automatically  and  can  download  desired  information  to  TIS.  The 
resulting  files  can  then  be  post-processed  by  programs  that  permit  online  review,  the 
display  of  statistics,  the  creation  of  indexes  and  concordances,  and  text  analysis.  In  view 
of  the  uncertain  legal  and  monetary  implications  of  these  powerful  procedures,  we  have 
limited  our  applications  to  the  DOE/RECON  online  information  system  of  the  Department 
of  Energy,  and  are  exploring  its  extensions  to  other  federal  information  systems. 

2.  AUTOMATED  ACCESS  PROCEDURES 

We  are  developing  the  automated  and  transparent  access  procedures  to  different 
information  centers  as  part  of  the  prototyping  of  an  Intelligent  Gateway  Computer  (IGC). 
Users  of  TIS  may  consult  the  availability  of  programmatic  resources  stored  internally  for 
their  use,  or  made  available  to  them  by  external  information  sources. 


Users

Internal Information Sources

•  Directory to resources

•  Programmatic database

•  Personal computer/database
Each external information center is qualified online on TIS by its accreditation, the
availability and cost of its databases, an annotated description of salient commands, and
prevailing up-times. This information is extracted by periodic transfer from such centers
to  TIS  for  online  consultation  by  the  TIS  user  community  prior  to  their  use,  which  saves 
time  and  communication  costs.  Access  to  federal  and  commercial  resources  is  granted  to 
TIS  users  on  an  individual  basis  by  the  Database  Administrator,  where  appropriate. 
Authorized  users  gain  access  to  other  information  centers  simply  by  giving  the  command 
CONNECT,  followed  by  the  target  name  of  the  desired  resource:  e.g., 

CONNECT  DOE/RECON  12  will  establish  access  to  DOE/RECON  at  1200  baud. 

Alternatively, users may specify the TIS option number of the desired resource, which
is  part  of  each  online  menu.  In  either  case,  users  need  not  be  familiar  with  telephone 
dial-up  numbers,  accounts,  passwords,  or  peculiar  protocols. 

The  seven  main  user  communities  of  the  Technology  Information  System  establish 
their  own  views  of  their  internal  and  external  programmatic  resources,  in  a  self-guided 
manner,  without  programmer  intervention.  Database  Administrators  assigned  to  each 
user  community  control  access  rights  on  a  need-to-know/need-to-use  basis.  Individual 
users  see  only  those  resources  (e.g.  datafiles,  interactive  models,  graphs  and  reports)  to 
which  they  have  access.  An  exception  is  the  external  resources  we  advertise  to  promote 
their  use.  When  a  user  is  finished  using  a  TIS-provided  external  resource,  his  access  rights 
can  be  removed  by  the  TIS  Database  Administrator  and  no  change  of  passwords  is  required 
since  none  were  disclosed. 
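
The CONNECT procedure and the role of the Database Administrator can be illustrated with a minimal Python sketch. It is not the TIS implementation: the directory entry, option number, telephone number, account name, and user name are all hypothetical, and reading the speed code 12 as "rate divided by 100" is an assumption drawn only from the single example above. The point it shows is that stored access details never reach the user, so revoking a grant is sufficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str     # target name given to CONNECT, e.g. "DOE/RECON"
    option: int   # menu option number (hypothetical value)
    phone: str    # stored dial-up number, never shown to the user
    account: str  # stored account, never shown to the user


# Hypothetical directory entry; real TIS entries are not given in the text.
DIRECTORY = [Resource("DOE/RECON", 3, "555-0100", "tis-shared")]

# Access rights granted by a Database Administrator (hypothetical user).
GRANTS = {"alice": {"DOE/RECON"}}


def connect(user, target, speed_code=12):
    """Resolve a CONNECT request given a target name or a menu option number."""
    res = next((r for r in DIRECTORY if target in (r.name, str(r.option))), None)
    if res is None:
        raise LookupError(f"unknown resource: {target}")
    if res.name not in GRANTS.get(user, set()):
        raise PermissionError(f"{user} has no access to {res.name}")
    # Assumption: the speed code is the baud rate divided by 100 (12 -> 1200).
    return {"resource": res.name, "baud": speed_code * 100}


def revoke(user, name):
    """Remove an access right; no password change is needed afterwards."""
    GRANTS.get(user, set()).discard(name)
```

The returned dictionary deliberately omits the telephone number and account, mirroring the statement that none of these were disclosed to the user.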

The  DIAL  command  provides  an  equally  powerful, but  user-controlled,  method  for 
accessing  other  information  centers  and  computers.  In  this  case,  the  user  is  prompted  to 
specify  the  telephone  number,  baud  rate,  parity,  and  other  parameters,  i.e.: 


        Telephone    Baud rate    Parity    Duplex    Protocol

                     [-300]       [-o]      [-h]      [-b]
DIAL    [number]     [-1200]      [-e]      [-f]      [-v]

TIS  then  establishes  the  communication  similar  to  an  automated  telephone  dialer. 
Users  have  to  provide  their  own  accounts  and  passwords  for  login  on  the  external  host 
machine.  Such  procedures  can  be  saved  for  personal,  routine  use. 
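
As an illustration of how a DIAL command line of this shape might be interpreted, the following Python sketch parses the telephone number and option flags into a settings record. It is an assumption-laden example rather than the TIS code: reading -o/-e as odd/even parity and -h/-f as half/full duplex is our interpretation, the protocol letters are passed through uninterpreted, and the function name `parse_dial` is hypothetical. A value of `None` stands for a parameter the user would still be prompted for.

```python
def parse_dial(command):
    """Parse 'DIAL number [-300|-1200] [-o|-e] [-h|-f] [-b|-v]' into settings."""
    words = command.split()
    if len(words) < 2 or words[0].upper() != "DIAL":
        raise ValueError("expected: DIAL number [options]")

    # None marks a parameter that was not given on the command line.
    settings = {"telephone": words[1], "baud": None, "parity": None,
                "duplex": None, "protocol": None}

    # Flag meanings for parity and duplex are assumptions (see lead-in).
    flags = {
        "-300": ("baud", 300), "-1200": ("baud", 1200),
        "-o": ("parity", "odd"), "-e": ("parity", "even"),
        "-h": ("duplex", "half"), "-f": ("duplex", "full"),
        "-b": ("protocol", "b"), "-v": ("protocol", "v"),
    }
    for word in words[2:]:
        if word not in flags:
            raise ValueError(f"unknown option: {word}")
        key, value = flags[word]
        settings[key] = value
    return settings
```

A saved personal procedure could then be as simple as a stored command string passed to this parser.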


When an account with another information center is opened for TIS, the vendor bills
TIS at LLNL, which, in turn, deducts the appropriate costs from the responsible
programmatic accounts. When users establish their own accounts with information
centers and use them via TIS, they are billed directly by the vendors, who cannot


3.  DOWNLOADING  OF  INFORMATION 


The  user  can  initiate  downloading  of  information  from  another  system  to  TIS  in  two 
ways:  First,  the  SAVEON  option  permits  extraction  of  information  when  used  with  the 
CONNECT  command  discussed  previously.  In  this  case,  all  the  information  seen  on  the 
screen  during  one  session  is  placed  into  a  user-named  file. 


Approximately  100  citations  with  abstracts  can  be  extracted  and  saved  in  10  minutes 
at  1200  baud  by  asynchronous  telephone  dial-up.  Faster  transmission  is  possible  with 
9600-baud  dedicated  and  conditioned  synchronous  lines.  Second,  the  DIAL  command 
permits extraction and downloading into one or more individual user-named files that
can  be  opened  and  closed  at  liberty  by  special  control  characters  during  a  session,  e.g. 

ESCAPE CTRL-A  --  prompts the user for a file name and saves the viewed
                   information therein. An additional ESC-CTRL-A closes
                   the file. If the file already exists, the new information is
                   appended to permit progressive creation of a cumulative
                   subject datafile.

ESCAPE CTRL-B  --  sends a local file from TIS to a remote machine. This has
                   particular importance when downloaded and saved
                   information is to be transferred to more powerful
                   computers for analysis, or is to be shared with someone
                   else via TIS.

Other  special  control  characters  permit  the  user  to  stop  the  viewing,  and/or  saving, 
of  information  and  to  address  the  local  or  remote  computer  selectively. 
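
The open/close/append behavior described for the capture control sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names are hypothetical, the control-character handling itself is omitted, and `toggle` stands in for one press of the capture sequence.

```python
class SessionCapture:
    """Append viewed text to a user-named file while capture is switched on."""

    def __init__(self):
        self.path = None  # None means no capture file is currently open

    def toggle(self, path=None):
        """One press: open the named file if closed, otherwise close it."""
        self.path = None if self.path else path

    def show(self, text):
        """Display text to the user and, if capture is on, append it to the file."""
        if self.path:
            # Append mode keeps earlier material, so repeated sessions build
            # a cumulative subject file.
            with open(self.path, "a", encoding="utf-8") as handle:
                handle.write(text)
        return text
```

Opening the file in append mode on every write is the simplest way to reproduce the "new information is appended" behavior; a production version would more likely keep the file handle open between writes.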

The  legal  and  monetary  implications  of  downloading  and  sharing  information 
extracted  from  other  centers  must  be  considered  carefully. 




4.  POST-PROCESSING  OF  BIBLIOGRAPHIC  INFORMATION

When  a  retrospective  search  is  carried  out  for  a  new  field  of  interdisciplinary 
research,  it  is  not  unusual  to  obtain  thousands  of  citations  from  different  information 
vendors,  in  different  formats,  with  redundant  overlap. 

Recently,  a  request  made  to  the  Transportation  Systems  Research  program  at  LLNL 
to identify foreign R&D in electric batteries yielded 1.5 ft of printouts from federal and
commercial information vendors. Carried out by conventional means, the 2000 citations
purchased at a cost of about $1700 were quite useless. It was very difficult to convey to
DOE  headquarters  meaningful  statistics,  insights,  or  magnitudes  of  ongoing  R&D  abroad  in 
time.  It  took  six  days  alone  for  the  off-line  prints  to  arrive  by  mail.  One  solution  to  this 
problem,  at  least  for  databases  from  DOE/RECON,  is  the  post-processing  of  downloaded 
citations  with  TIS. 

TIS offers programs for the archival disk storage of retrieved information in a
convenient,  user-defined  filing  system.  This  permits  results  to  be  organized  and 
aggregated  in  a  suitable  manner.  Redundant  citations  can  be  eliminated  by  their 
congruent  main  data  fields,  primarily  by  comparison  of  authors  and  titles.  The  resulting 
unique  set  is  then  reviewed  and  analyzed  online  by  self-guided  routines: 

*  REVIEW  citations  for  relevancy. 

*  DISPLAY  graphs  of  publication  rates. 

*  PERMUTE  multi-term  expressions  in  data  fields. 

*  CROSS-CORRELATE  contents  of  data  fields,  with  statistics. 

*  CONCORD  citations  by  author,  subject,  descriptors,  etc. 
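
The elimination of redundant citations by comparison of authors and titles, described above, might be sketched as follows. This is a minimal Python example, not the TIS routine: the citation record layout (a dictionary with `authors` and `title` keys), the normalization rule, and the function names are all assumptions made for illustration.

```python
import re


def normalize(text):
    """Lower-case and collapse punctuation so vendor formatting differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_key(citation):
    """Build a comparison key from the main data fields: authors and title."""
    authors = sorted(normalize(a) for a in citation.get("authors", []))
    return (tuple(authors), normalize(citation.get("title", "")))


def eliminate_redundancy(citations):
    """Keep the first citation seen for each distinct authors/title key."""
    seen = set()
    unique = []
    for citation in citations:
        key = match_key(citation)
        if key not in seen:
            seen.add(key)
            unique.append(citation)
    return unique
```

Exact matching on normalized fields is the simplest choice; it will miss duplicates whose titles were abbreviated or whose author lists were truncated by one vendor, which is where a fuzzier comparison would be needed.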

The  REVIEW  command  permits  online  determination  of  relevancy.  Citations  are 
shown  on  the  screen  reformatted  by  accentuation  and  indentation  of  titles,  authors,  and 
abstracts. This renders them more readable than citations commonly offered by
information  centers.  The  viewer  may  keep  or  discard  any  citation  shown  and  assign  his 
own  category  and  relevancy  code.  He  may  add  comments,  order  the  full-length  text,  and 
define  and  fill  new  data  fields  for  numeric  and/or  administrative  purposes.  Retained  and 
annotated  citations  are  saved  in  new  user-named  files.  Fields  defined  during  the  review 
process  can  subsequently  be  used  with  other  fields  for  post-processing. 


ON-LINE DETERMINATION OF RELEVANT SET
The DISPLAY of the publication rate for a particular search topic, an institute, or
author  provides  an  immediate  indication  of  the  effort  or  growth  in  their  field  of  activity. 
It  is  probably  the  first  display  an  end-user  may  wish  to  see  and  is  carried  out  on  TIS  by  the 
simple  command  YEAR-GRAPH.  In  most  cases,  as  shown  in  the  example  below,  we  see  an 
apparent  decline  after  a  sharp  rise  in  the  publication  rate.  This  decline  is  predominantly 
the  lag  in  time  between  the  publication  of  the  primary  literature  and  its  entry  into  the 
secondary online database holdings. To appreciate the above-average increase in
publications  in  a  particular  field,  one  has  to  compare  it  with  the  total  annual  increase  in 
the  publication  rate.  In  the  sciences,  this  rate  is  now  about  13%. [6] 


DOE/RECON Data Analysis Program
Number of Publications per Year
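
A display of this kind amounts to counting citations per publication year and drawing one bar per year. The Python sketch below shows the idea as a text histogram; it is an illustration only, and the record layout (a `year` key per citation), the function name, and the bar scaling are assumptions rather than details of the YEAR-GRAPH command.

```python
from collections import Counter


def year_graph(citations, width=40):
    """Return one text line per year: year, count, and a bar scaled to the peak."""
    counts = Counter(c["year"] for c in citations)
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    # Include years with no publications so gaps remain visible.
    for year in range(min(counts), max(counts) + 1):
        n = counts.get(year, 0)
        bar = "#" * round(n * width / peak)
        lines.append(f"{year} {n:4d} {bar}")
    return lines
```

When reading such a graph, the most recent years should be discounted for the indexing lag noted above.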


The  most  significant  aspect  of  post-processing  is  probably  the  time-dependent 
change,  or  momentum,  of  a  particular  field  in  R&D,  derived  from  the  statistics  of  its 
permuted  descriptive  terms.  The  PERMUTE  command  of  our  post-processing  routines 
provides  this  option  by  counting  the  number  of  times  a  specified  term  appears  in  the 
message-carrying  fields  of  citations,  like  the  title,  abstract,  descriptor,  category,  etc. 
This  is  done  by  analyzing  single  and  compound  expressions  containing  up  to  four  terms 
[e.g.,  solar  energy  conversion  experiments].  All  compound  expressions  of  this  type 
appearing  in  the  selected  data  fields  are  presented  to  the  viewer  online,  alphabetically 
ordered, with their frequency of occurrence. The tables following show this for two recent
projects  carried  out  by  the  Research  Information  Group  of  the  Technical  Information 
Department  at  LLNL  for  DOE  patents  and  the  DOE  flywheel  program. 
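
The counting of single and compound expressions of up to four terms can be sketched in Python as follows. This is a simplified illustration and not the PERMUTE routine itself: fields are assumed to be plain strings split on whitespace, no stop-word removal or punctuation handling is attempted, and the function name and record layout are hypothetical.

```python
from collections import Counter


def permute(citations, fields=("title",), max_terms=4):
    """Count every run of 1..max_terms adjacent words in the selected fields."""
    counts = Counter()
    for citation in citations:
        for field in fields:
            words = citation.get(field, "").lower().split()
            for n in range(1, max_terms + 1):
                for i in range(len(words) - n + 1):
                    counts[" ".join(words[i:i + n])] += 1
    # Alphabetical order with frequency of occurrence, as presented to the viewer.
    return sorted(counts.items())
```

Running the same count separately for each publication year would give the time-dependent change in term frequencies that the text calls the momentum of a field.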
CROSS-CORRELATIONS of expressions between any two fields of the citations can
provide new insight. For example, by cross-correlating authors, we can see at a glance
who  is  working  with  whom.  A  cross-correlation  of  the  author  field  with  the  descriptor 
field  shows,  in  alphabetic  order,  the  statistics  of  indexing  terms  assigned  to  the  work  of  a 
particular  person  for  his  entire  professional  career.  When  carried  out  in  yearly 
increments,  this  routine  can  be  used  to  judge  the  change  of  emphasis  with  time. 
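
A cross-correlation of two citation fields reduces to counting how often a value of one field occurs in the same citation as a value of the other. The Python sketch below illustrates both cases mentioned above, author with author and author with descriptor. It is an assumption-based example: fields are taken to be lists of strings, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cross_correlate(citations, field_a, field_b):
    """Count co-occurrences of values of field_a and field_b within citations."""
    counts = Counter()
    for citation in citations:
        a_values = set(citation.get(field_a, []))
        b_values = set(citation.get(field_b, []))
        if field_a == field_b:
            # Same field: count unordered pairs, e.g. who publishes with whom.
            pairs = combinations(sorted(a_values), 2)
        else:
            pairs = ((a, b) for a in a_values for b in b_values)
        counts.update(pairs)
    # Alphabetical order with the statistics attached.
    return sorted(counts.items())
```

To judge a change of emphasis with time, the citation list can be filtered to a single year before each call.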
CONCORDANCES generated by author, descriptor, corporate author, or country
yield succinct listings of bibliographic citations in a particular field. These alphabetical
indexes  are  similar  to  those  commonly  produced  as  look-up  tables  for  authors  or  subjects. 
In  this  case,  they  are  created  at  the  pleasure  of  the  user,  online,  on  the  contents  of  any 
citation  field. 
Extracts from concordances of DOE patents by authors and descriptors.
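
A concordance on an arbitrary citation field is essentially an alphabetical index from each field value to the citations carrying it. The following Python sketch shows one way to build such an index; it is illustrative only, with a hypothetical function name and record layout, and it lists citations by title for brevity.

```python
from collections import defaultdict


def concord(citations, field):
    """Build an alphabetical index from each value of a field to citation titles."""
    index = defaultdict(list)
    for citation in citations:
        values = citation.get(field, [])
        if isinstance(values, str):
            values = [values]  # allow single-valued fields such as country
        for value in values:
            index[value].append(citation["title"])
    # Sort headings without regard to case, and titles under each heading.
    return {key: sorted(index[key]) for key in sorted(index, key=str.lower)}
```

Because the field name is a parameter, the same routine serves for author, descriptor, corporate author, or country indexes.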
Output from the post-processing routines can be saved in files for subsequent use, or
for  transfer  to  electronic  word  processors  and  merging  with  reports,  or  can  be  sent  to 
typesetters  as  camera-ready  copy  for  publication.  A  full  description  of  present  TIS 
post-processing  capabilities  has  been  published.[7]  Similar  approaches  are  being 
developed  elsewhere.[8] 

5.  CONCLUSION 

Downloading  and  post-processing  of  bibliographic  citations  and  numeric  data  offer 
the  information  specialist  powerful  and  cost-effective  tools  for  the  repackaging  of  search 
results  and  the  delivery  of  more  relevant  information  products.  At  present,  our  work  at 
LLNL  is  concentrated  on  applying  and  refining  these  tools  for  DOE/RECON  and  is  being 
sponsored  by  the  Department  of  Energy  Technical  Information  Center  (DOE/TIC). 
However,  the  response  has  been  so  favorable  that  we  have  been  asked  to  explore  the 
possibilities  of  extending  these  capabilities  to  databases  of  other  federal  information 
systems.  This  extension  requires,  where  possible,  unified  command  languages  and  the 
reformatting  of  retrieved  citations. 

These  TIS  capabilities  have  been  demonstrated  by  the  NASA  Industrial  Applications 
Center  at  the  University  of  Southern  California,  where  information  specialists  linked  their 
terminals  via  TIS  to  clients'  terminals  elsewhere  in  the  country  while  conducting  search 
and post-processing with databases from DOE/RECON and NASA/RECON.[9,10] This
linkage provided simultaneous viewing and voice communication, and instant delivery of
the  refined  product  to  the  end-user,  thereby  speeding  substantially  the  timely  delivery  of 
information  products. 

Downloading and post-processing of bibliographic information is being developed at LLNL
to improve the transfer of government-sponsored technology among federal agencies and
to industry.[11,12] Similar post-processing routines are being developed throughout the
information industry. Commercial database producers and information vendors must
arrive at practical solutions for the use of such technological innovations.[13]